Abstract

In the simple versus composite hypothesis test with a proper prior, the
Bayes Factor (BF) is shown to be the posterior mean of the Likelihood
Ratio (LR). Therefore, the posterior standard deviation of the LR, or
rather its posterior cumulative distribution function, can be used to
indicate the significance of a detection by the BF, and this detection
procedure can be computed from a single Markov chain. It is applied and
compared for exoplanet detection.

The previous statistics can be expressed from the Fractional BF (FBF)
[1] and the Probability distribution of the LR (PLR) [2]. Two properties
of the PLR related to the GLRT are noted, and a procedure to optimize
the two-parameter PLR and FBF detectors according to their ROC
curves is proposed. The performances of all tests are compared.

1 Introduction 

The detection of a signal from low signal-to-noise ratio data is a general issue in
signal processing. For a given dataset $x$, we express the detection as the
deterministic choice between a simple hypothesis (no signal: $\eta_0 = 0$) and a
composite hypothesis:

$$ H_0 : \eta = \eta_0 \qquad H_1 : \eta \sim \pi(\eta) \qquad (1) $$

$\pi(\eta) \neq \delta(\eta)$ is a given proper multivariate prior describing the
uncertainty and constraints on the intensities $\eta$ of the signal of interest.
As an alternative to the 0-1 decision, a no-decision region could have been used [2113].

The likelihoods have the same expression under $H_0$ and $H_1$ and depend on
$\eta$ only. For frequentists, this means that all other parameters are known.
For Bayesians, they have been marginalized out.



In addition to the mere detection result, information about the significance
of the decision is generally expected. In frequentist settings it is usually given
by the PFA or the p-value of the detection statistic [S]. These notions are
also studied in the Bayesian perspective [7]. However, the PFA as well as the
p-value require an integration of the likelihood (or other Bayesian distributions)
over a subset of the sample space, and this computation may be intractable.

In Bayesian settings, the Posterior Odds Ratio $\mathrm{POR} = \Pr(H_0|x)/\Pr(H_1|x)$
minimizes the Bayesian risk under the 0-1 loss function and appears as the expression
of exactly what is looked for. It is equal to the classical Bayes Factor (BF) [8]

$$ \mathrm{BF} = \frac{p(x|H_0)}{p(x|H_1)} = \frac{p(x|\eta = \eta_0)}{\int d\eta\, p(x|\eta)\,\pi(\eta)} \qquad (2) $$

up to the multiplicative prior odds ratio $\mathrm{pOR} = \Pr(H_0)\Pr(H_1)^{-1}$.

In [5], the Bayesian detector consists in thresholding the BF and giving as
the error the posterior probability of the selected model, $\Pr(H_{i(x)}|x)$, because it
gives an "intrinsic significance level" [S]. However, for $i \in \{0, 1\}$

$$ \Pr(H_i|x) = \frac{p(x|H_i)\Pr(H_i)}{\sum_{j=0}^{1} p(x|H_j)\Pr(H_j)} = \frac{1}{1 + (\mathrm{BF} \times \mathrm{pOR})^{1-2\delta_{i0}}}, $$

where $\delta_{ij}$ is the Kronecker symbol. The two pieces of information delivered by
the detector and the "error" are largely redundant since their relation involves
no quantity other than the pOR. Consequently, we consider them insufficient
outputs of the detection procedure.
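
The redundancy noted above can be seen by evaluating the relation directly. The following is a minimal sketch; the function name `posterior_model_probability` and its argument names are illustrative assumptions, not notation taken from the cited references.

```python
def posterior_model_probability(bf, prior_odds, i):
    """Pr(H_i | x) = 1 / (1 + (BF * pOR)^(1 - 2*delta_{i0})), for i in {0, 1}."""
    if i not in (0, 1):
        raise ValueError("i must be 0 or 1")
    delta_i0 = 1 if i == 0 else 0      # Kronecker symbol delta_{i0}
    exponent = 1 - 2 * delta_i0        # -1 for H0, +1 for H1
    return 1.0 / (1.0 + (bf * prior_odds) ** exponent)


# Example with equal prior probabilities (pOR = 1) and BF = 0.04.
p0 = posterior_model_probability(0.04, 1.0, 0)
p1 = posterior_model_probability(0.04, 1.0, 1)
```

Once the pOR is fixed, both probabilities are deterministic functions of the BF alone, so reporting them adds no information beyond the BF itself.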

Another important issue is the performance that can be reached by the
detector. Following naturally from the first study below, both issues will be
addressed theoretically and practically.



2 New practical error inference for a detection from the Bayes Factor

In the simple versus composite test (1), when there is no nuisance parameter
or when they are marginalized out thanks to a Bayesian computation, the
(Bayesian) Likelihood Ratio is:

$$ \mathrm{LR}(\eta) = \frac{p(x|\eta_0)}{p(x|\eta)} \qquad (3) $$

The dependencies on $x$ are dropped in the sequel. For a given $x$, it is a function
of only one random parameter and therefore has a posterior distribution under
$H_1$.
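If the prior $\pi$ is proper, it turns out (no reference found) that the Bayes
Factor (2) is equal to the posterior mean of the LR:

$$ \mathrm{BF} = \frac{p(x|H_0)}{p(x|H_1)} = \int d\eta\, \pi^*(\eta|x)\, \frac{p(x|H_0)}{p(x|\eta)} = E^{\pi^*}[\mathrm{LR}(\eta)|x] \qquad (4) $$

where $\pi^*$ is the posterior of $\eta$. The second equality follows from
$\pi^*(\eta|x) = p(x|\eta)\pi(\eta)/p(x|H_1)$ and $\int d\eta\, \pi(\eta) = 1$.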
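
The identity (4) can be checked numerically on a model where both sides are available. The sketch below assumes a scalar conjugate Gaussian toy model (x | eta ~ N(eta, 1), prior eta ~ N(0, tau2), eta_0 = 0), which is chosen only for illustration and is not the model used later in this paper; all variable names are hypothetical.

```python
import numpy as np

# Toy model (an assumption for illustration, not the paper's model):
#   x | eta ~ N(eta, 1),  prior eta ~ N(0, tau2),  eta_0 = 0.
# tau2 < 1 keeps the posterior variance of the LR finite in this toy model.
rng = np.random.default_rng(0)
x, tau2, n = 1.0, 0.25, 200_000


def normal_pdf(v, mean, var):
    """Density of N(mean, var) evaluated at v."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


# Closed-form Bayes Factor: p(x | eta_0) / p(x | H1), with p(x | H1) = N(x; 0, 1 + tau2).
bf_exact = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 1.0 + tau2)

# Exact draws from the conjugate posterior pi*(eta | x).
post_mean = x * tau2 / (1.0 + tau2)
post_var = tau2 / (1.0 + tau2)
eta = rng.normal(post_mean, np.sqrt(post_var), size=n)

# LR(eta) = p(x | eta_0) / p(x | eta), evaluated along the posterior draws.
lr = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, eta, 1.0)

bf_mc = lr.mean()  # posterior mean of the LR, to be compared with bf_exact
```

In this toy setting the Monte Carlo average `bf_mc` and the closed-form `bf_exact` should agree up to sampling error; with an improper prior the closed form would not exist and the comparison would not be meaningful.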

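Uncertainty on the detection could then naturally be given by the posterior
standard deviation of the LR:

$$ \text{"}\mathrm{LR}(\eta) = \mathrm{BF} \pm \sigma\text{"} \qquad (5) $$
$$ \text{with } \sigma = (\mathrm{var}^{\pi^*}[\mathrm{LR}(\eta)|x])^{1/2} = (\mathrm{FBF}(-1) - \mathrm{BF}^2)^{1/2} \qquad (6) $$

where we recognize the Fractional Bayes Factor [1]

$$ \mathrm{FBF}(b) = \frac{p(x|\eta_0)}{\int d\eta\, p(x|\eta)\,\pi(\eta)} \left[ \frac{p(x|\eta_0)^b}{\int d\eta\, p(x|\eta)^b\,\pi(\eta)} \right]^{-1} = E^{\pi^*}[\mathrm{LR}(\eta)^{1-b}|x] \qquad (7) $$

except that the FBF was initially proposed for $b \in [0, 1)$ as a partial Bayes
Factor developed to extend the BF to improper priors.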
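
For readers who want the intermediate step behind the last equality of the FBF expression, the following short derivation is a sketch using only the definition of the LR and Bayes' rule for the posterior $\pi^*(\eta|x) = p(x|\eta)\pi(\eta)/p(x|H_1)$; it assumes a proper prior, as in the rest of this section.

```latex
\begin{align*}
E^{\pi^*}\!\left[\mathrm{LR}(\eta)^{1-b} \,\middle|\, x\right]
  &= \int d\eta\; \frac{p(x|\eta)\,\pi(\eta)}{p(x|H_1)}
     \left(\frac{p(x|\eta_0)}{p(x|\eta)}\right)^{1-b} \\
  &= \frac{p(x|\eta_0)^{1-b}}{p(x|H_1)} \int d\eta\; p(x|\eta)^{b}\,\pi(\eta)
   \;=\; \mathrm{FBF}(b), \\[4pt]
\mathrm{FBF}(0) &= E^{\pi^*}[\mathrm{LR}(\eta)\,|\,x] = \mathrm{BF}, \qquad
\mathrm{FBF}(-1) = E^{\pi^*}[\mathrm{LR}(\eta)^2\,|\,x], \\[4pt]
\mathrm{var}^{\pi^*}[\mathrm{LR}(\eta)\,|\,x] &= \mathrm{FBF}(-1) - \mathrm{BF}^2 .
\end{align*}
```

The case $b = -1$ lies outside the range for which the FBF was originally introduced, which is why it is used here only as a convenient notation for the second posterior moment of the LR.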

However, the BF is used as a statistic to threshold: the underlying
distribution of $\mathrm{LR}(\eta)$ is explored in a non-symmetric fashion, and the
uncertainty (5), related to the second moment, may be inappropriate.

An alternative is the computation of a confidence interval or simply of the
cumulative distribution of the variable:

$$ \mathrm{PLR}(\zeta) = \Pr^{\pi^*}\{\mathrm{LR}(\eta) < \zeta \,|\, x\} \qquad (8) $$

It turns out that the posterior distribution of the LR has already been
studied to some extent. It was proposed in [10] and extended and applied in
[HHT]. However, its use could be advocated more strongly.

For a practical use of the suggested tools, we propose to use a single Monte
Carlo Markov chain $\eta^{[n]} \sim \pi^*(\eta|x)$ for all estimation and detection purposes:

• the chain $\mathrm{LR}(\eta^{[n]})$ is straightforwardly computed

• the FBF (7) is computed from an importance sampling procedure (e.g. a
simple average) from $\mathrm{LR}(\eta^{[n]})$ for a given $b$. The FBF is used for
$\mathrm{BF} = \mathrm{FBF}(0)$ and possibly for $\sigma$

• the PLR (8) is computed as the empirical cumulative distribution of the
LR chain

• if a signal is detected, $\eta^{[n]}$ can finally be used for estimation
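
The steps listed above can be sketched as follows. This is a minimal illustration that assumes the LR chain has already been computed from posterior samples; the function names `fbf` and `detection_statistics` are hypothetical, and the simple-average estimator is the one mentioned in the list, not necessarily the most efficient choice.

```python
import numpy as np


def fbf(lr_chain, b):
    """Simple-average estimate of FBF(b) = E[LR^(1-b) | x] from an LR chain."""
    lr = np.asarray(lr_chain, dtype=float)
    return np.mean(lr ** (1.0 - b))


def detection_statistics(lr_chain, zeta):
    """BF, posterior standard deviation of the LR, and PLR(zeta) from one chain."""
    lr = np.asarray(lr_chain, dtype=float)
    bf = fbf(lr, 0.0)                            # BF = FBF(0)
    second_moment = fbf(lr, -1.0)                # FBF(-1) = E[LR^2 | x]
    sigma = np.sqrt(max(second_moment - bf ** 2, 0.0))
    plr = np.mean(lr < zeta)                     # empirical CDF of the LR at zeta
    return {"BF": bf, "sigma": sigma, "PLR": plr}


# Tiny hand-checkable chain (illustrative values, not real data).
stats = detection_statistics([0.5, 1.0, 1.5, 2.0], zeta=1.2)
```

Because every statistic is an average over the same chain, no additional sampling is needed once the posterior samples are available.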
3 Other Bayesian detectors related to the Posterior probability of the LR

The PLR (8) and FBF (7) appeared in Sec. 2 as natural statistics for the
definition of a coherent procedure for detection and have proved easy to compute
numerically. They are further studied here.

3.1 Properties of the PLR (related to the GLRT) 

First, we state and prove two general remarks (no reference found) about the
posterior density $p_{\mathrm{LR}|x}$ of the LR:

• The minimum of its support is the GLRT:

$$ \min\{\zeta : p_{\mathrm{LR}|x}(\zeta|x) > 0\} = \mathrm{GLRT} $$

• Under regularity assumptions that get stronger as $L$ (the size of $\eta$)
increases, the function $\zeta \mapsto p_{\mathrm{LR}|x}(\zeta|x)$ diverges for $\zeta \to \mathrm{GLRT}^+$.

In the same Bayesian frame as for the definition of LR (3), the (Bayesian)
Generalized Likelihood Ratio Test (GLRT) is defined for the simple versus
composite test (1) by

$$ \mathrm{GLRT} = \min_{\eta \in \mathcal{E}} \mathrm{LR}(\eta) \qquad (9) $$

where $\mathcal{E} = \mathrm{Sup}(p_{x|\eta}) \cap \mathrm{Sup}(\pi)$ in order to take into account the definition
domain of the likelihood and the constraints of the parameter set. Therefore,
$\mathcal{E} = \mathrm{Sup}(\pi^*)$ and

$$ \Pr^{\pi^*}\{\mathrm{LR}(\eta) < \mathrm{GLRT} \,|\, x\} = 0 \qquad (10) $$

Under regularity assumptions in the neighborhood of the GLRT (reached by
definition for $\eta = \eta_{ML}$), we have

$$ p_{\mathrm{LR}|x}(\zeta|x) \to \infty \quad \text{when } \zeta \to \mathrm{GLRT}^+ \qquad (11) $$

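In the following, we drop the conditionality on $x$. A usual transform to infer
the distribution of LR is $\phi : \eta \to (\mathrm{LR}, \tilde\eta_1)$ where we note
$\tilde\eta_1 = (\eta_2, \dots, \eta_L)$. Its Jacobian determinant is
$|J| = |\partial \mathrm{LR}/\partial \eta_1|$. The usual variables transformation
gives for an open set

$$ p_{\mathrm{LR},\tilde\eta_1}(\zeta, \tilde u_1) = \sum_{k=1}^{n(\zeta, \tilde u_1)} \frac{\pi^*(u^k)}{|\partial \mathrm{LR}/\partial \eta_1|(u^k)} $$

where the $u^k$ are the solutions of $\phi(u^k) = (\zeta, \tilde u_1)$.

For $L = 1$ ($\eta$ is scalar), it gives directly the result: if the function
$\mathrm{LR} : \eta \to \zeta$ is continuously differentiable, $|\partial \mathrm{LR}/\partial \eta|(u) \to 0$
as $u \to \mathrm{argmin}(\mathrm{LR}(u))$. So $p_{\mathrm{LR}}(\zeta) \to \infty$ as $\zeta \to \mathrm{GLRT}$.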
For $L > 1$, $L - 1$ integrations are required to marginalize out $\tilde\eta_1$. They have
to be computed for a given $\zeta > \mathrm{GLRT}$ since the Jacobian is not defined at
$\zeta = \mathrm{GLRT}$ and since we assume $\eta_{ML}$ is the only solution of
$\mathrm{LR}(\eta_{ML}) = \mathrm{GLRT}$, so that the integrand would be positive on a null set only.
We show in [2] that if locally there exist $\alpha > L$ and
$(a_1, \dots, a_L) \in \mathbb{R}^{+L}$ such that for all $\eta$ close enough to $\eta_{ML}$

$$ \mathrm{LR}(\eta) \leq \mathrm{GLRT} + \sum_{l=1}^{L} a_l\, |\eta_l - \eta_{ML,l}|^{\alpha} $$

then $\epsilon^{-1} \Pr(\mathrm{GLRT} < \mathrm{LR} < \mathrm{GLRT} + \epsilon) \to \infty$ when $\epsilon \to 0$.
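
Both properties can be observed empirically in the scalar case. The sketch below assumes a conjugate Gaussian toy model (x | eta ~ N(eta, 1), prior eta ~ N(0, tau2), eta_0 = 0), chosen only because its posterior can be sampled exactly; it is not the model of Sec. 4, and the helper name `scaled_mass` is hypothetical.

```python
import numpy as np

# Toy scalar model (L = 1), assumed for illustration only:
#   x | eta ~ N(eta, 1),  prior eta ~ N(0, tau2),  eta_0 = 0.
rng = np.random.default_rng(1)
x, tau2, n = 1.0, 0.25, 200_000

# Exact draws from the conjugate posterior pi*(eta | x).
post_mean = x * tau2 / (1.0 + tau2)
post_var = tau2 / (1.0 + tau2)
eta = rng.normal(post_mean, np.sqrt(post_var), size=n)

# LR(eta) = p(x | 0) / p(x | eta) = exp(((x - eta)^2 - x^2) / 2).
lr = np.exp(0.5 * ((x - eta) ** 2 - x ** 2))

# The likelihood is maximal at eta_ML = x, so GLRT = LR(x) = exp(-x^2 / 2).
glrt = np.exp(-0.5 * x ** 2)


def scaled_mass(eps):
    """Chain estimate of Pr(GLRT < LR < GLRT + eps) / eps."""
    return np.mean((lr > glrt) & (lr < glrt + eps)) / eps


ratios = {eps: scaled_mass(eps) for eps in (1e-2, 1e-3, 1e-4)}
```

In this toy model no sample of the LR chain falls below the GLRT, and the scaled probability mass just above the GLRT grows as the window shrinks, which is the finite-sample signature of the divergence (11).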

3.2 Optimal parametrization of PLR and FBF 

In addition to the motivations of their initial developments, the PLR and FBF are
interesting to study as detectors because they can be seen as families of tests
parametrized by two parameters:

Reject $H_0$ if $\mathrm{PLR}(\zeta_0) > p_0$
Reject $H_0$ if $\mathrm{FBF}(b) < \zeta$

For PLR, $\Lambda = (\zeta_0, p_0)$ and for FBF $\Lambda' = (b, \zeta)$. Unlike detectors defined from
a single threshold, it is possible to optimize each family. We propose to do it
using the frequentist ROC curve tool, which displays the Probability of good
Detection (PD) as a function of the Probability of False Alarm (PFA). For the
PLR,

$$ \mathrm{PFA}(\zeta_0, p_0) = \Pr^{p(x|H_0)}\{\mathrm{PLR}(\zeta_0) > p_0\} \qquad (12) $$

In principle, the idea is first to compute $\mathrm{PFA}(\Lambda)$ and $\mathrm{PD}(\Lambda)$ for all
$\Lambda = (\zeta_0, p_0)$, then to fix a $\mathrm{PFA}_0$, obtain the corresponding curve
$\{\Lambda : \mathrm{PFA}(\Lambda) = \mathrm{PFA}_0\}$ and choose from it the $\Lambda_{opt}(\mathrm{PFA}_0)$ that
maximizes $\mathrm{PD}(\Lambda)$. We propose to do it numerically thanks to the practical
computation proposed in Sec. 2. From a large number of datasets (one set of
datasets under $H_0$ and one set of datasets under $H_1$), two matrices made of the
LR chains are formed. The trick is that PFA, PD and $p_0$ are asymptotically
regularly sampled in the matrices as soon as the matrices are reordered. Then,
the approximate optimal parameters can almost be "read" from the tables.
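
To make the principle concrete, here is a brute-force sketch of the optimization of the PLR family at a fixed target PFA. It is not the matrix-reordering shortcut described above: it simply scans a grid of $\zeta_0$ values and sets $p_0$ from an empirical quantile under $H_0$. The toy Gaussian model, the grid, and the names `simulate_lr_chains` and `optimize_plr` are all assumptions made for illustration.

```python
import numpy as np


def simulate_lr_chains(eta_true, n_datasets, n_samples, tau2, rng):
    """LR chains for a toy model: x | eta ~ N(eta, 1), eta ~ N(0, tau2), eta_0 = 0.

    Returns an (n_datasets, n_samples) matrix; each row is the LR chain of one dataset.
    """
    x = rng.normal(eta_true, 1.0, size=(n_datasets, 1))
    post_mean = x * tau2 / (1.0 + tau2)
    post_sd = np.sqrt(tau2 / (1.0 + tau2))
    eta = rng.normal(post_mean, post_sd, size=(n_datasets, n_samples))
    return np.exp(0.5 * ((x - eta) ** 2 - x ** 2))


def optimize_plr(lr_h0, lr_h1, pfa_target, zeta_grid):
    """Grid search over zeta_0; p_0 is the empirical (1 - PFA) quantile under H0."""
    best = None
    for zeta0 in zeta_grid:
        plr_h0 = np.mean(lr_h0 < zeta0, axis=1)   # PLR(zeta_0) per H0 dataset
        plr_h1 = np.mean(lr_h1 < zeta0, axis=1)   # PLR(zeta_0) per H1 dataset
        p0 = np.quantile(plr_h0, 1.0 - pfa_target)
        pfa = np.mean(plr_h0 > p0)                # empirical false alarm rate
        pd = np.mean(plr_h1 > p0)                 # empirical detection rate
        if best is None or pd > best["PD"]:
            best = {"zeta0": zeta0, "p0": p0, "PFA": pfa, "PD": pd}
    return best


rng = np.random.default_rng(2)
lr_h0 = simulate_lr_chains(0.0, 2000, 500, 4.0, rng)   # datasets under H0
lr_h1 = simulate_lr_chains(2.5, 2000, 500, 4.0, rng)   # datasets under H1, fixed eta
best = optimize_plr(lr_h0, lr_h1, pfa_target=0.1, zeta_grid=[0.1, 0.3, 1.0, 3.0])
```

The grid search is quadratic in cost compared with reading sorted tables, but it makes explicit what is being optimized: for each $\zeta_0$, the threshold $p_0$ is tied to the target PFA, and only the PD is compared across $\zeta_0$.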

4 Application of the detection procedure 

The estimation-detection procedure of Sec. 2 is realistically applied to the
detection of exoplanets from direct imaging using the future VLT instrument
SPHERE.

Figure 1: Simulated data from CAOS-SPHERE with a contrast of 10^ between
the star and the planet. Left: a;2(20)''-^ Right: P2{20f -^.

4.1 Statistical model for exoplanet detection in direct imaging

To specify the statistics (BF, etc.) as pure functions of $x$, a statistical model is
required. A hierarchical Bayesian model precisely related to our context has
been developed in [13] and is summarized here.

The $\eta$ vector of the hypothesis test (1) refers to the exoplanet intensity in
the different channels. The marginalized prior $\pi(\eta)$ has a positive support, is
proper and approximately scale invariant.

The dataset is made of $K$ successive sets of $L$ images, where each image is an
$M \times 1$ vector $i_l(k)$. The $x_k = (i_1(k)^t, \dots, i_L(k)^t)^t$ are assumed to be
conditionally independent and described by:

$$ x_k \,|\, \mu, \Sigma, \eta \sim \mathcal{N}_{LM}(A_k \eta + \mu, \Sigma) \qquad (13) $$

$A_k$, the source profiles, are assumed to be known. This first-level likelihood
is marginalized using conjugate priors and leads to an explicit form for $p(x|\eta)$
where $x = \{x_k\}_{k=1,\dots,K}$.

The Markov chain $\eta^{[n]} \sim \pi^*(\eta|x)$ necessary to compute PLR and FBF (see
Sec. 2) is obtained from a slice sampling method HHI.

4.2 Application of the detection procedure on a realistic dataset

The simulation of realistic astrophysical datasets is performed by the dedicated
physical step-by-step Software Package SPHERE [15], developed and used within
the CAOS environment [16]. A dataset $x$ is simulated under $H_1$ with a
luminosity contrast of 10^ between the star and the exoplanet (corresponding to
an intensity $\eta_{H_1}$), and another under $H_0$, obtained from an area adjacent to
the one under $H_1$. The data under $H_1$, of size $(K, L, M) = (20, 2, 425)$, are
illustrated in Fig. 1. Note that it is impossible to simulate many datasets.

The detection procedure described in Sec. 2 and used with the realistic
statistical model summarized in Sec. 4.1 is finally applied to these two datasets.
The hyperparameters are chosen simply ($\nu = 2M$, $\Sigma_0 = \sigma^2 I_{2M}$, ...) or
unfavourably ($m_0 = \ln(1000\,\eta_{H_1})$). Both chains $\eta^{[n]}$ are made of N = 10^ samples.

Fig. 2 shows the histograms of the Markov chains resulting from these two
cases. In the $H_1$ case, the Bayes Factor (2) seems to indicate a detection with no
ambiguity: $\mathrm{BF} = 0.04 < \zeta_0$ for $\zeta_0 = 0.1$. The uncertainty (5) gives
$\sigma = 0.34$ ("LR = 0.04 ± 0.34"), but for the reasons mentioned in Sec. 2 a
quantile should be more relevant than a moment to infer the uncertainty on a
detector. The measure $\mathrm{PLR}(\zeta_0) = 0.94 > 0.8$ confirms the absence of ambiguity
of the BF result. Similarly, in the $H_0$ case, the BF test again indicates with no
ambiguity that there is no exoplanet: "LR = 3.7 (± 86)". This is confirmed
by the quantile $\mathrm{PLR}(\zeta_0) = 0$. For more complete information, the empirical
posterior distributions of $\mathrm{LR}(\eta^{[n]})$ are presented in Fig. 3 and Fig. 4. They
also illustrate the properties shown in Sec. 2 and 3.1.



Finally, estimation can be performed for the data where a signal has been
detected (i.e. data simulated under $H_1$). The posterior distribution is shown in
Fig. 2 (left). The signal is estimated by the posterior mean and its uncertainty
by the posterior standard deviation: $\hat\eta = (6.2 \pm 2.8;\ 4.6 \pm 2.6) \cdot 10^{-5}$ for a true
$\eta_{H_1} = (8;\ 0.5) \cdot 10^{-5}$.

4.3 Comparison with a practical and totally frequentist 
GLRT 

The proposed procedure is compared to a classical Generalized Likelihood Ratio
Test (not the "Bayesian" GLRT (9)). The likelihood used to compute it is the
first-level likelihood (13), except that the covariance matrix is assumed to be
proportional to identity: $\Sigma = \sigma^2 I_{LM}$. Then

$$ \mathrm{GLRT2} = \frac{\max_{\mu,\sigma}\{\prod_k p(x_k|\mu, \sigma^2 I_{LM}, \eta = 0)\}}{\max_{\mu,\sigma,\eta}\{\prod_k p(x_k|\mu, \sigma^2 I_{LM}, \eta)\}} \qquad (14) $$
Figure 3: Histograms of the $\mathrm{LR}(\eta^{[n]})$ chains, computed from the chains $\eta^{[n]}$
shown in Fig. 2. The (Bayesian) GLRT (9) and the BF (2) are indicated (see
Sec. 2 and 3.1 for the proofs).

Figure 4: A posteriori empirical cumulative distributions of LR, displayed from
the chains $\mathrm{LR}(\eta^{[n]})$ shown in Fig. 3.
The analytical maximization of the likelihood under $H_1$ for $L = 2$ generalizes a
computation in [17] where $L = 1$, and leads to:



GLRT..^^J (15) 
KLM 



where 



where fj^Q = and (/i^fi, ^y^ri) ^^'^ P-ho minimize least square criteria obtained 



from the model ( 13 ) 



The GLRT, unlike $\mathrm{LR}(\eta^{[n]})$, always has a value less than or equal to 1
because the hypotheses are nested. Here, ln(GLRT2) = -4350 for the data
simulated under $H_1$ and ln(GLRT2) = -1300 under $H_0$. Since it is not numerically
possible to realistically simulate a large number of datasets, it is impossible to
relate numerically the threshold of the GLRT to its Probability of False Alarm
(PFA). The model (13) is not identically distributed, so the classical results
on the asymptotic distribution of the GLRT do not apply either. It is consequently
difficult to choose the threshold.

In any case, the values of the GLRT2 applied to areas close to but distinct
from the previous cases indicate that the GLRT2 discriminates with difficulty
between $H_0$ and $H_1$.



5 Illustration of the FBF and PLR optimizations as detectors

The other interesting property of the PLR (8) and FBF (7), presented in Sec. 3.2,
is now illustrated in an astrophysical context entirely similar to the previous one,
but the data are now simulated from the statistical model and not the physical
one, so that a long-run performance analysis can be performed. The data are
simulated from the marginalized likelihood presented in [13] for $KLM = 80$.
For simplicity, the data under $H_1$ are characterized by a fixed $\eta = \eta_{H_1}$.

Fig. 5 illustrates the ROC curves obtained for some intuitive
parametrizations ($\zeta_0 = 1$, etc.) and the optimal ones. We note that:

• The classical Bayes Factor performs uniformly worse than the other
FBF and the PLR. For PFA = 0.1, the performance of the PLR exceeds
that of the Bayes Factor by 15%.

• The tests with fixed parametrization have performances very close to the
optimal ones. This strengthens the case for their use.

• The poor performance of the GLRT (15), where $\Sigma$ was wrongly assumed
to be proportional to identity, is confirmed here: it is equivalent to a
coin toss.
Figure 5: ROC curves of the PLR, the FBF and the GLRT2 (Eq. 15).



6 Conclusion 



In this paper, a coherent and practical detection procedure has been proposed.
The procedure relies on the fact that for a simple versus composite test using
a proper prior the Bayes Factor can be expressed as the posterior mean of the
Likelihood Ratio. The statistics involved (FBF and PLR) are computable from
the single Markov chain $\eta^{[n]} \sim \pi^*(\eta|x)$. It has been realistically applied and
compared to a reasonable alternative and proved satisfactory. Finally, two more
properties of the PLR, related to the GLRT, have been given, and the PLR and
FBF families have been studied as optimizable detectors from a ROC curve.
These results have been applied and show that intuitive parametrizations of
these tests are close to optimal.